In this article we analyse the linear correlation and non-linear dependence of traded volume, v, of
the 30 constituents of the Dow Jones Industrial Average at different value scales. Specifically, we
raise v to a real power α or β, which introduces a bias towards small (α, β < 0) or large (α, β > 1)
values. Our results show that small values of v are regularly anti-correlated with values at other
scales of traded volume. This is consistent with the high liquidity of the 30 equities analysed and with
the asymmetric form of the multi-fractal spectrum for traded volume, which has supported the dynamical
scenario presented by us.

PACS numbers: 05.45.Tp (Time series analysis); 89.65.Gh (Economics, econophysics, financial markets,
business and management); 05.40.-a (Fluctuation phenomena, random processes, noise and Brownian
motion).

Keywords: Financial market; Traded volume; Correlation; Nonextensivity 



I. INTRODUCTION 



Financial market analysis has become one of the most significant examples of the application of concepts associated
with physics to systems that are usually studied by other sciences [?]. In this sense, ideas like scale invariance and
cooperative phenomena have also found significance in systems that are described neither by a Hamiltonian
nor by any other kind of equation usually associated with physics (e.g., a master equation). Although plenty of work
has been done on the analysis and mimicry of price fluctuations, less attention has been paid to an important
observable intimately related to changes in price, the traded volume, v. In fact, traded volume has been coupled
to price fluctuations, both empirically and analytically, for some time [?]. Nonetheless, a consistent analysis of
the intrinsic statistical properties of traded volume appears to have been first presented in Reference [?]. Thereafter, it has been
enlarged or revisited by different authors [?]. In this article, we apply a generalisation of the traditional linear
self-correlation function in order to study how small, large, and about-average (frequent) values of v relate to one
another in time. Furthermore, we analyse non-linear dependence using a generalised measure based on the Kullback-Leibler
mutual information. Our data set is made up of 1-minute traded volume time series, running from 1st July 2004
to 31st December 2004, for the 30 equities that make up the Dow Jones Industrial Average index. To avoid
the well-known intraday profile, the traded volume time series were previously treated according to a standard procedure
(see e.g. [?]).

II. GENERALISED LINEAR SELF-CORRELATION FUNCTION 

The (normalised) correlation function, generally, 

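$$
C\bigl(A(r,t),\,B(r',t')\bigr) = \frac{\langle A(r,t)\,B(r',t')\rangle - \langle A(r,t)\rangle\,\langle B(r',t')\rangle}{\sqrt{\bigl(\langle A(r,t)^2\rangle - \langle A(r,t)\rangle^2\bigr)\bigl(\langle B(r',t')^2\rangle - \langle B(r',t')\rangle^2\bigr)}} \qquad (1)
$$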

represents a useful analytical form to evaluate how much two random variables depend, linearly, on each other.
Leaving out spatial dependence, when A and B are the same observable, Eq. (1) represents the most straightforward
way to appraise memory in the evolution of A. However, it does not give us any information about the role of
magnitudes. Inspired by multi-fractal analysis, a simple way to quantify this type of correlation can be defined by
introducing a generalised self-correlation function, C(A_α(t), A_β(t')) = C_{α,β}(A), where A_α(t) = |A(t)|^α, A_β(t) = |A(t)|^β
(with α, β ≠ 0 ∈ ℜ), and t' = t + τ ^1. As an example, let us assume β = 1. For values of α greater than 1, small values
of A become even smaller and their weight in the value of C_{α,β}(A), due to A_α(t) A_β(t'), becomes negligible (e.g.,
when α = 2, v = 10^-1 > v^α = 10^-2 and v = 10 < v^α = 10^2). Conversely, when α is negative, we highlight values
around zero (e.g., when α = −1, v = 10^-1 < v^α = 10 and v = 10 > v^α = 10^-1). In the end, after summing over all
pairs (A_α(t), A_β(t')), we verify that the main contribution to C_{α,1}(A) comes from large values of |A(t)| when α > 1
and from small values of |A(t)| when α < 0. Accordingly, for α = β we estimate how values of the same order of
magnitude are related in time, whereas for α ≠ β we analyse the relation between values of different magnitudes.

TABLE I: Values of the fitting parameters of C_{α,β}(A) for a double exponential, f(x) = a exp(−x/T_1) + b exp(−x/T_2), for the
results presented in the panels of Fig. 1.

In Fig. 1 we depict the results obtained by applying Eq. (1), with different pairs (α, β), to the traded
volume time series. In Table I we present the fitted parameters of C_{α,β}(A) for a double exponential
function.

We have set −1 and 2 as the minimum and maximum values for the exponents. This choice is justified by the fact that
both are able to evaluate the influence of small and large values of v while preserving reliable statistics.
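
As an illustration, the generalised self-correlation function of this section can be sketched in a few lines of Python. This is only a minimal sketch, not the code used for the results reported here: the function names `pearson` and `generalised_self_correlation` are hypothetical, the lag is measured in samples, and the series is assumed to contain no zeros (otherwise negative exponents are undefined).

```python
import math


def pearson(xs, ys):
    """Normalised linear correlation of two equally long samples, as in Eq. (1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    # Raises ZeroDivisionError if either sample is constant.
    return cov / math.sqrt(var_x * var_y)


def generalised_self_correlation(series, alpha, beta, lag):
    """Correlate |A(t)|^alpha with |A(t + lag)|^beta (assumption: no zeros in series)."""
    if lag < 0 or lag > len(series) - 2:
        raise ValueError("lag must leave at least two pairs of points")
    head = [abs(a) ** alpha for a in series[:len(series) - lag]]
    tail = [abs(a) ** beta for a in series[lag:]]
    return pearson(head, tail)
```

For example, `generalised_self_correlation(v, -1, 1, 5)` would relate small values of a series `v` to its typical values five samples later, while `alpha = beta = 1` recovers the ordinary self-correlation.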

From the analysis of the values in Table I we observe that small values, α (β) = −1, are always anti-correlated with
both frequent (α (β) = 1) and large (α (β) = 2) values of traded volume. We verify that there is temporal symmetry,
which can be checked by exchanging α and β. When α (β) equals −1, the second relaxation scale is consistently much
larger than the one observed when both exponents are positive. In addition, the coefficients a and b are smaller (in modulus)
when at least one of the exponents is −1. This indicates that, besides exerting a negative influence on
frequent and large values of v, small values have only a limited influence. On the other hand, we have observed a very fast first
decay of the correlation function for β = 2 and α = 1, 2, followed by a slower decay, though still faster than
for α = β = 1. This might be interpreted as a consequence of the low frequency of large values of v. This richness
and disparity in behaviour for small and large values is consistent with a previous multi-fractal analysis of v [?]. In
that analysis, a strong asymmetry in the multi-fractal spectrum was observed, which has been associated with the
existence of different dynamical mechanisms prompting small and large values of traded volume [?, 10].



^1 Hereon A(t) is assumed to be a stationary time series. A dependence on the waiting time, t, would be an indication of non-stationarity
in the signal.

FIG. 1: C_{α,β}(v) vs. τ for the values of α and β presented in Table I (clockwise). On each panel, black symbols are the values
obtained from the time series and the grey line represents the numerical fit of C_{α,β}(v) to a double exponential. In the panel for
C_{−1,1}(v) the curves for τ > 0 and τ < 0 coincide, in line with time symmetry. The inset on the C_{1,1}(v) panel is
a log-log representation of the main panel. As is visible, the correlation function does not present power-law behaviour. The
same happens for all the other values of (α, β) studied.

III. GENERALISED MUTUAL INFORMATION APPLIED TO DJ30 TRADED VOLUME TIME SERIES

In information theory, the Kullback-Leibler (KL) mutual information [11] (or information gain, or information
divergence) is a distance measure (but not a metric distance) that provides the mean change of information related to
any two probability distributions, p and p'. If we have, say, two experiments, with a given set of discrete outcomes
and probability distributions p and p', respectively, then the KL mutual information might be defined as,



$$
K(p,p') = -\sum_j p_j \ln\frac{p'_j}{p_j} \qquad (2)
$$



where p_j (p'_j) is the probability of outcome j in experiment one (two). As a special case, we consider two random
variables x and y and we set p to be the joint probability distribution, p = p(x, y), and p' the product of the marginal
probability distributions, p' = p_1(x) p_2(y). In this particular case, the KL mutual information is usually referred to simply as
mutual information (we will denote it as I(x, y)). It is a useful and natural tool to measure the degree of statistical
dependence between two random variables, and it has also been applied in financial analysis [?].
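
The special case just described can be illustrated with a short Python sketch that estimates I(x, y) from a list of discrete (x, y) observations by simple counting. The function names `kl_divergence` and `mutual_information` are hypothetical, and the plug-in (frequency) estimate of the probabilities is an assumption of this sketch rather than a statement about the estimator used in this work.

```python
import math
from collections import Counter


def kl_divergence(p, p_prime):
    """K(p, p') = -sum_j p_j ln(p'_j / p_j); outcomes with p_j = 0 contribute nothing."""
    return -sum(pj * math.log(qj / pj) for pj, qj in zip(p, p_prime) if pj > 0)


def mutual_information(pairs):
    """I(x, y): KL divergence between the joint distribution and the product of marginals."""
    n = len(pairs)
    joint = Counter(pairs)
    marginal_x = Counter(x for x, _ in pairs)
    marginal_y = Counter(y for _, y in pairs)
    # Both lists follow the iteration order of the same Counter, so they stay aligned.
    p = [count / n for count in joint.values()]
    p_prime = [(marginal_x[x] / n) * (marginal_y[y] / n) for (x, y) in joint]
    return kl_divergence(p, p_prime)
```

Two independent fair coins give I(x, y) = 0, whereas two perfectly tied fair coins give I(x, y) = ln 2.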

When considering traded volume, and since we are dealing with correlated non-linear processes, a natural way of
generalising the KL mutual information is to replace the usual statistical theory by the non-extensive
statistical theory [13]. In this generalisation, the usual logarithm must be replaced by the q-logarithm, defined as

$$
\ln_q x \equiv \frac{x^{1-q} - 1}{1-q}.
$$

Applied to Eq. (2), this replacement gives I_q(p, p') = −Σ_j p_j ln_q(p'_j / p_j). When q → 1, the q-logarithm becomes the usual one.
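
A minimal Python sketch of the q-logarithm and of the resulting generalised KL measure is given below. It assumes the standard non-extensive definition ln_q x = (x^(1−q) − 1)/(1 − q) and the direct substitution of ln by ln_q in Eq. (2); the function names `q_log` and `generalised_kl` are hypothetical.

```python
import math


def q_log(x, q):
    """q-logarithm: (x**(1 - q) - 1) / (1 - q), reducing to ln(x) when q = 1."""
    if x <= 0:
        raise ValueError("q_log is only defined here for x > 0")
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def generalised_kl(p, p_prime, q):
    """I_q(p, p') = -sum_j p_j ln_q(p'_j / p_j); outcomes with p_j = 0 are skipped."""
    return -sum(pj * q_log(qj / pj, q) for pj, qj in zip(p, p_prime) if pj > 0)
```

With q = 1 the function `generalised_kl` reproduces the ordinary KL measure of Eq. (2), and for identical distributions it vanishes for every q.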



For q > 0, there exist well-defined minimum and maximum values of I_q(p, p') corresponding to the minimum and
maximum degrees of dependence between random variables, e.g., x and y. This allows us to define a criterion for
statistical testing [15] through the normalised quantity R ≡ I_q / I_q^max ∈ [0, 1]. Its extreme value R = 0 (R = 1)
corresponds to zero (full) dependence between x and y. Given x and y, the ratio R can be calculated as a function of
q. Typically, R varies smoothly and monotonically from 0 to 1, its two limiting values. The inflexion point of R(q)
determines the value of q for which R most sensitively detects changes in the correlation between x and y. We call this
value of q the optimal value, q^op. It can be seen [15] that q^op = 0 for one-to-one dependence and q^op = ∞ for
total independence.
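
Locating the inflexion point numerically can be sketched as follows, assuming that R has already been evaluated on a grid of q values (the evaluation of I_q^max itself is not covered here). Since an inflexion point of a monotonic curve is where its slope is extremal, the sketch simply returns the midpoint of the grid interval with the steepest finite-difference slope. The names `inflexion_point` and `logistic_curve` are hypothetical, and the logistic curve is only a synthetic stand-in for a smooth monotonic R(q).

```python
import math


def inflexion_point(qs, rs):
    """Midpoint of the grid interval on which |dR/dq| is largest."""
    best_q, best_slope = None, -1.0
    for k in range(len(qs) - 1):
        slope = abs((rs[k + 1] - rs[k]) / (qs[k + 1] - qs[k]))
        if slope > best_slope:
            best_slope = slope
            best_q = 0.5 * (qs[k] + qs[k + 1])
    return best_q


def logistic_curve(q, centre, width):
    """Synthetic monotonic curve from 0 to 1 with its inflexion at q = centre."""
    return 1.0 / (1.0 + math.exp(-(q - centre) / width))
```

The resolution of the estimate is limited by the grid spacing, so a finer grid in q (or a local polynomial fit around the steepest interval) would be needed for error bars of the size quoted in the caption of Fig. 2.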

The generalised mutual information R has already been applied in [3] to traded volume time series from
the components of the Dow Jones 30 index. In order to compare this quantity with the self-correlation function,
we have considered x to be the time series and y the same time series shifted by a time lag τ. Here, we have
further analysed these data by performing the calculation along the same lines as in Section II, i.e., we have defined
our random variables by modifying the (normalised, detrended) traded volume v through the exponents α and β:
x_α = v^α and y_β = v^β. Then, we have computed R with the same exponents as in Section II. Our procedure
can be summarised as follows. We have first derived the probability distributions for each component time series i
and its lagged counterpart with lag τ. To construct the PDFs, we have set the bin size (or, in physics terms, the
coarse-graining) to Δx = Δy = 0.02, ∀i. We have then calculated R_i as a function of the index q. From this, we
have extracted an optimal index q_i^op(τ, α, β) for each component i. Finally, we have computed the mean q^op value
over all 30 components, i.e. q^op = (1/30) Σ_{i=1}^{30} q_i^op(τ, α, β) ^2.
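
The first step of the procedure summarised above, building the joint PDF of a transformed series and its lagged copy on a grid of bin size 0.02, might look as follows in Python. This is a sketch under stated assumptions: the function name `binned_joint_pdf` is hypothetical, the lag is measured in samples, bins are indexed by integer division of the value by the bin size, and the series is assumed to contain no zeros when a negative exponent is used.

```python
import math
from collections import Counter


def binned_joint_pdf(series, alpha, beta, lag, bin_size=0.02):
    """Joint probability of (x, y) cells, with x = |v(t)|^alpha and y = |v(t + lag)|^beta."""
    xs = [abs(v) ** alpha for v in series[:len(series) - lag]]
    ys = [abs(v) ** beta for v in series[lag:]]
    # Each pair is assigned to the cell given by the integer bin index of x and of y.
    counts = Counter(
        (math.floor(x / bin_size), math.floor(y / bin_size)) for x, y in zip(xs, ys)
    )
    n = len(xs)
    return {cell: count / n for cell, count in counts.items()}
```

The marginal distributions follow by summing the returned dictionary over one of the two bin indices, after which R_i can be evaluated as a function of q for each component i and the optimal indices averaged over the 30 components.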

In Fig. 2 we present our results for different values of α and β. First we plot, for comparison purposes, the
unmodified case α = 1, β = 1 (panel a). There is a clear logarithmic dependence of q^op on the lag τ.
In the same panel we plot our results for α = 2, β = 1, and its symmetric case α = 1, β = 2. We again obtain a
logarithmic behaviour, but both the additive and the multiplicative fitting parameters change. The rate of change is higher,
indicating that this particular choice of α and β accelerates the loss of dependence in time. For α = 1, β = 2,
the multiplicative parameter is very close to that of the α = 2, β = 1 case (see caption), reflecting the same kind of symmetry
observed in Section II. In panel c we present results where α > 0, β < 0 or α < 0, β > 0. Our results show that, in
this case, q^op diminishes as a function of the lag, but the rate of change is not as high as in the α > 0, β > 0 case (see
caption). Note that this result occurs for the same exponents for which anti-correlated behaviour is found (Section II),
suggesting that anti-correlation might imply a negative slope in the logarithmic behaviour. This possibility will be
verified in future work, namely in the analysis of the dependence between volatility and traded volume [13].
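
The logarithmic law q^op = A + B log τ discussed above can be fitted by ordinary least squares on log τ. The Python sketch below is an illustration only: the function name `fit_log_law` is hypothetical, the natural logarithm is assumed (the base is not specified in the text), and no uncertainties on A and B are computed.

```python
import math


def fit_log_law(lags, q_values):
    """Least-squares estimate of (A, B) in q = A + B * ln(lag); lags must be positive."""
    xs = [math.log(lag) for lag in lags]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(q_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, q_values))
    slope = sxy / sxx                    # multiplicative parameter B
    intercept = mean_y - slope * mean_x  # additive parameter A
    return intercept, slope
```

A positive slope B corresponds to the behaviour of panel a of Fig. 2, and a negative slope to that of panel c.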

To further analyse the meaning of these results, we have performed the same calculations on a shuffled version of
the same time series, i.e. applying a random reordering (in time) to each component time series. We show our results
in Fig. 2, panels b and d, where we have used the same exponents as in panels a and c, respectively. This shuffling
procedure destroys causality, and in every case q^op loses its dependence on τ. In all cases, the curves obtained
from the unshuffled data evolve towards the shuffled ones, probably reaching them for high values of τ. Thus, one
can consider that the q^op obtained from the shuffled time series acts as a saturation value for the unshuffled case, reached when all
dependence is lost.
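
The effect of shuffling can be illustrated with a small Python sketch. For simplicity it uses the ordinary lagged linear correlation as a stand-in for q^op, and a synthetic first-order autoregressive series instead of traded volume; both are assumptions of this sketch, as are the function names `lagged_correlation`, `shuffled_copy` and `ar1_series`.

```python
import math
import random


def lagged_correlation(series, lag):
    """Normalised linear correlation between the series and its copy shifted by lag."""
    head = series[:len(series) - lag]
    tail = series[lag:]
    n = len(head)
    mean_h = sum(head) / n
    mean_t = sum(tail) / n
    cov = sum((h - mean_h) * (t - mean_t) for h, t in zip(head, tail))
    var_h = sum((h - mean_h) ** 2 for h in head)
    var_t = sum((t - mean_t) ** 2 for t in tail)
    return cov / math.sqrt(var_h * var_t)


def shuffled_copy(series, seed=0):
    """Random reordering in time; the set of values is preserved, the ordering is not."""
    rng = random.Random(seed)
    copy = list(series)
    rng.shuffle(copy)
    return copy


def ar1_series(n, phi, seed=1):
    """Synthetic series with memory: x(t) = phi * x(t - 1) + Gaussian noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out
```

Comparing `lagged_correlation(s, 1)` with `lagged_correlation(shuffled_copy(s), 1)` for `s = ar1_series(5000, 0.9)` shows the strong memory of the original series disappearing after the reordering, while the distribution of values is left untouched.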



IV. FINAL REMARKS 



To conclude, we have applied a generalised form of the correlation function, C_{α,β}(.), in order to evaluate how values
of different magnitudes influence each other. The results obtained indicate that small values of traded volume
are consistently anti-correlated with frequent and large values. Moreover, frequent and large values are positively
correlated. These results are in accordance with the strong asymmetry of the multi-fractal spectrum, which has
supported our dynamical scenario for the observable [?].

We have also investigated the effect of modifying our data through α and β on the Kullback-Leibler generalised
mutual information and its associated optimal index q^op. Our results show that there is a logarithmic dependence
of q^op on the lag in the case of positive exponents (α > 0, β > 0), with different fitting parameters depending on these
exponents. In the case of negative α or β, we have observed that q^op diminishes with lag. A further analysis of this
intriguing behaviour is certainly welcome.

We thank C. Tsallis for several conversations on the subjects treated in this manuscript. SMDQ acknowledges



^2 Although it has been proved that statistical features depend on liquidity, our averaging is completely justified since our companies
present trading values (per minute) within the same class [?].

FIG. 2: Optimal index q^op versus lag τ. Panel a: Lines correspond to the fitting function q^op = A + B log τ, where (A, B)
is (1.667 ± 0.003, 0.035 ± 0.001) (solid line), (1.563 ± 0.004, 0.035 ± 0.001) (dashed line) and (1.583 ± 0.001, 0.0223 ± 0.0003)
(dotted line) for (α, β) = (2, 1), (1, 2) and (1, 1), respectively. Panel b: Same as in a for the shuffled version of the time
series. Constant values are: 1.954 ± 0.002 (solid line), 1.858 ± 0.001 (dashed line) and 1.7713 ± 0.0008 (dotted line). Panel c:
Logarithmic fitting as in a: (A, B) is (1.977 ± 0.004, −0.007 ± 0.001) (solid line), (1.974 ± 0.008, −0.009 ± 0.002) (dashed line) and
(2.07 ± 0.01, −0.004 ± 0.002) (dotted line) for (α, β) = (2, −1), (1, −1) and (−1, 1), respectively. Panel d: Same as in c for the
shuffled version of the time series. Constant values are: 1.881 ± 0.002 (solid line), 1.869 ± 0.002 (dashed line) and 1.953 ± 0.005
(dotted line).